System and method for gesture interface

ABSTRACT

A method for determining a gesture includes determining a change in a background of an image from a plurality of images, determining an object in the image, determining a trajectory of the object through the plurality of images, and classifying a gesture according to the trajectory of the object.

BACKGROUND OF THE INVENTION

[0001] 1. Field of the Invention

[0002] The present invention relates to computer interfaces, and more particularly to a real-time gesture interface for use in medical visualization workstations.

[0003] 2. Discussion of the Prior Art

[0004] In many environments, traditional hands-on user interfaces, for example, a mouse and keyboard, for interacting with a computer are not practical. One example of such an environment is an operating theater (OT), where there is a need for strict sterility. A surgeon, and everything coming into contact with his/her hands, must be sterile. Therefore, the mouse and keyboard may be excluded from consideration as an interface because they may not be sterilized.

[0005] A computer may be used in the OT for medical imaging. The interaction can include commands to display different images, scrolling through a set of two-dimensional (2D) images, changing imaging parameters (window/level), etc. With advances in technology, there is a growing demand for three-dimensional (3D) visualizations. The interaction and manipulation of 3D models is intrinsically more complicated than for 2D models, even if a mouse and keyboard can be used, because the commands may not be intuitive when working in 3D. Examples of commands in a 3D medical data visualization environment include rotations and translations, including zoom.

[0006] Areas of human-machine interaction in the OT include, for example, voice recognition and gesture recognition. There are several commercially available voice recognition systems. In the context of the OT, their advantage is that the surgeon can continue an activity, for example, a suture, while commanding the imaging system. However, the disadvantage is that the surgeon needs to mentally translate geometric information into language: e.g., “turn right”, “zoom in”, “stop”. These commands need to include some type of qualitative information. Therefore, it can be complicated and tiresome to achieve a specific 3D orientation. Other problems related to voice recognition are that it may fail in a noisy environment, and the system may need to be trained for each user.

[0007] Researchers have attempted to develop systems that can provide a natural, intuitive human-machine interface. Efforts have been focused on the development of interfaces without mouse or device-based interactions. In the OT, the need for sterility warrants the use of novel schemes for human-machine interfaces for the doctor to issue commands to a medical imaging workstation.

[0008] Gesture recognition includes two sequential tasks, feature detection/extraction and pattern recognition/classification. A review of visual interpretation of hand gestures can be found in V. I. Pavlovic, R. Sharma, and T. S. Huang, “Visual interpretation of hand gestures for human-computer interaction: A review”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):677-695, July 1997.

[0009] For feature detection/extraction, applications may use color to detect human skin. An advantage of a color-based technique is real-time performance. However, the variability of skin color in varying lighting conditions can lead to false detection. Some applications use motion to localize the gesture. A drawback of a motion cue approach is that assumptions may be needed to make the system operable, e.g., a stationary background and one active gesturer. Other methods, such as using data-gloves/sensors to collect 3D data, may not be suitable for a human-machine interface because they are not natural.

[0010] For pattern recognition and classification, several techniques have been proposed. The Hidden Markov Model (HMM) is one method. HMMs can be used for, for example, the recognition of American Sign Language (ASL). One approach uses motion-energy images (MEI) and motion-history images (MHI) to recognize gestural actions. Computational simplicity is the main advantage of such a temporal template approach. However, motion of unrelated objects may be present in MHI.

[0011] Neural networks are another tool used for recognition. In particular, a time-delay neural network (TDNN) has demonstrated the capability to classify spatio-temporal signals. TDNN can also be used for hand gesture recognition. However, TDNN may not be suitable for some environments, such as an OT, wherein the background can include elements contributing to clutter.

[0012] Therefore, a need exists for a system and method for a real-time interface for medical workstations.

SUMMARY OF THE INVENTION

[0013] According to an embodiment of the present invention, a method is provided for determining a gesture. The method includes determining a change in a background of an image from a plurality of images, and determining an object in the image. The method further includes determining a trajectory of the object through the plurality of images, and classifying a gesture according to the trajectory of the object.

[0014] Determining the change in the background includes determining a gradient intensity map for the background from a plurality of images, determining a gradient intensity map for the current image, and determining, for a plurality of pixels, a difference between the gradient intensity map and the gradient intensity map for the background. Determining the change in the background further includes determining a comparison between the difference and a threshold, and determining a pixel to be a background pixel according to the comparison.

[0015] The object includes a user's hand.

[0016] Determining the object in the image includes obtaining a normalized color representation for a plurality of colors in each image, determining from training images an estimate of a probability distribution of normalized color values for an object class, and determining, for each pixel, a likelihood according to an estimated probability density of normalized color values for the object class.

[0017] Determining the trajectory of the object through the plurality of images further comprises determining, for each pixel, a temporal likelihood across a plurality of images, and determining a plurality of moments according to the temporal likelihoods.

[0018] Determining the trajectory includes determining a difference in a size of the object over a pre-determined time period, determining a plurality of angles between a plurality of lines connecting successive centroids over the time period, and determining a feature vector according to the angles and lines.

[0019] The method further includes classifying the feature vector according to a time-delay neural network, wherein a feature is of a fixed length.

[0020] Classifying the gesture includes determining a reference point, determining a correspondence between the trajectory and the reference point, and classifying the trajectory according to one of a plurality of commands.

[0021] According to an embodiment of the present invention, a method is provided for determining a trajectory of a hand through a plurality of images. The method includes detecting a reference point, updating the reference point as the reference point is varied, and detecting a first translation of the hand through the plurality of images. The method further includes detecting a second translation through the plurality of images, determining a gesture according to a vote, and determining whether the gesture is a valid gesture command.

[0022] The reference point is not interpreted as a gesture command. The reference point is characterized by hand size and a location of a centroid of the hand in each image.

[0023] The first translation is one of a forward and a backward translation, wherein the first translation is characterized by a large change in hand size and a relatively small change in a centroid of the hand. The second translation is one of a left, a right, an up and a down translation.

[0024] Detecting the second translation includes determining a normalized vector between two centroids c_(t) and c_(t−1) as a feature vector, wherein there are three output patterns. The three output patterns are a vertical movement, a horizontal movement, and an unknown. The method further includes comparing the reference point to a centroid upon determining the translation to be a vertical or a horizontal translation, and testing an input pattern upon determining the translation to be an unknown translation. Testing an input pattern further comprises detecting a circular movement, wherein an angle between vector c_(t)c_(t−1) and vector c_(t−1)c_(t−2) is determined as the feature vector.

[0025] The valid gesture is performed continually for a predetermined time.

[0026] According to an embodiment of the present invention, a program storage device is provided readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for determining a gesture. The method includes determining a change in a background of an image from a plurality of images, determining an object in the image, determining a trajectory of the object through the plurality of images, and classifying a gesture according to the trajectory of the object.

BRIEF DESCRIPTION OF THE DRAWINGS

[0027] Preferred embodiments of the present invention will be described below in more detail, with reference to the accompanying drawings:

[0028] FIG. 1 is a screenshot of the Fly-Through visualization tool according to an embodiment of the present invention;

[0029] FIG. 2 is an image showing a user's operating hand in an image according to an embodiment of the present invention;

[0030] FIG. 3 shows modules of the gesture interface for medical workstations according to an embodiment of the present invention;

[0031] FIG. 4 shows a hierarchy of a TDNN-based classifier according to an embodiment of the present invention;

[0032] FIGS. 5a-d show an example of a method of discriminating movements according to an embodiment of the present invention; and

[0033] FIGS. 6a-h show an example of a method of determining a hand gesture wherein the hand is not held stationary according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0034] A system and method for a computer interface detects changes in a background portion of an image, classifies an object of interest based on color properties in the image, and extracts and classifies a gesture feature. The classification results can be used to control a 3D visualization system for medical image data, for example, Fly-Through. This system and method can achieve real-time performance in cluttered background settings. Further, the system and method can be implemented in conjunction with a medical image visualization system or method.

[0035] 3D Virtuoso is a postprocessing workstation from Siemens that has many 3D tools. One of these tools, Fly-Through, is a dedicated tool for Virtual Endoscopy Simulation. Besides generic 3D rendering capabilities, it has a view that shows a cavity, for example, a trachea or colon, from a viewpoint inside the body, i.e., the virtual endoscope. FIG. 1 is a screenshot of a visualization tool, in this case, Fly-Through, showing a global view of the data 101 as well as a virtual endoscope view 102 from a user-defined vantage point.

[0036] According to an embodiment of the present invention, the system and method can imitate the manipulation of an endoscope. The system and method allow the user to, for example, push, pull, pivot and turn a virtual endoscope. These and other commands can be provided through gesture recognition. Gestures can include, for example, degrees of translations including left, right, up, down, forward, and backward, and circular movements including clockwise and counterclockwise. Circular movements are viewed as rotations in the gesture interface. As FIG. 2 shows, a camera is fixed in front of a user's hand 201. A valid gesture command needs to be performed continually for a predetermined time to initialize the command. Repetition of a gesture, e.g., more than two times, can be considered as a valid command. For example, to drive the virtual endoscope to the left, the user may wave his hand from right to left, from left to right, and continue this movement until the virtual endoscope moves to the desired position. Thus, a high recognition rate, e.g., 95%, using hand gestures can be obtained.

[0037] The design of gestures can be important to a gesture interface. It may not be reasonable to ask a user to keep his/her hand in the visual field of the camera at all times. Also, meaningless hand movements need to be disregarded by the human-machine interface (HMI). For example, after performing a gesture, the user may want to move his/her hand out of the camera's field of view to do other operations, e.g., to make an incision. These kinds of hand movements are allowed and the HMI needs to ignore them. After the user initializes a valid gesture command, the system executes the command so long as the gesture continues. For example, the longer a gesture is performed, the larger the movement the virtual endoscope makes in the case of Fly-Through.

[0038] Consider two valid gesture commands, move left and move right. Both commands may require the user's hand to be waved horizontally, and the user can continue this movement as many times as desired. Given no information about where the movement starts, there may be no way to distinguish between the motion trajectory patterns, e.g., left or right waves. Similar ambiguities can occur when other translations are performed. For this reason, the system and method needs to know or determine a starting point for a gesture command. According to an embodiment of the present invention, by holding the hand stationary before performing a new gesture, the stationary point becomes a reference point. The reference point is used to distinguish among, for example, moving left or right, up or down, and forward or backward.

[0039] A gesture command can include various gestures, for example, using the representation of circular movements of a finger or rotating the hand to cause the view to rotate. In this example, drawing circles may be easier for the user than rotating the hand.

[0040] Referring to FIG. 3, the method includes detecting changes in the background of a video image in a sequence 301. The method can detect the skin-tone of a user according to a Gaussian mixture model 302. A motion trajectory of, for example, the user's hand, can be extracted from the video sequence 303. TDNN-based motion pattern classification 304 can be used to classify a hand gesture. The system sends the classification results to, for example, the Fly-Through visualization system.

[0041] The system and method can detect changes in a background by determining an intensity of each image from the video stream. To eliminate noise, a Gaussian filter can be applied to each image. A gradient map of pixel intensity can be determined. After determining the gradient map of a current image frame, the gradient map is compared with the learned background gradient map. If a given pixel differs by less than a threshold between these two gradient maps, the pixel is determined to be a background pixel, and can be marked accordingly. A pre-determined threshold can be used. One with ordinary skill in the art would appreciate, in light of the present invention, that additional methods for selecting the threshold exist, for example, through knowledge of sensor characteristics or through normal illumination changes allowed in the background. According to an embodiment of the present invention, the largest area of connected background pixels can be treated as the background region.
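
As an illustrative sketch of this background-detection step, the gradient-map comparison might be implemented as follows in Python with NumPy and SciPy; the function names, the smoothing sigma, and the fixed threshold value are assumptions for illustration, not values specified by the present disclosure:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, label, sobel

def gradient_map(gray):
    """Smooth the intensity image with a Gaussian filter, then
    compute the gradient magnitude of pixel intensity."""
    smoothed = gaussian_filter(gray.astype(np.float64), sigma=1.0)  # sigma assumed
    return np.hypot(sobel(smoothed, axis=1), sobel(smoothed, axis=0))

def background_mask(frame_gray, learned_bg_gradient, threshold=10.0):
    """Mark a pixel as background when its gradient differs from the
    learned background gradient map by less than the threshold."""
    diff = np.abs(gradient_map(frame_gray) - learned_bg_gradient)
    return diff < threshold  # True where the pixel is marked background

def largest_background_region(mask):
    """Keep only the largest connected area of background pixels,
    which is treated as the background region."""
    labels, n = label(mask)
    if n == 0:
        return mask
    sizes = np.bincount(labels.ravel())[1:]  # skip label 0 (non-background)
    return labels == (np.argmax(sizes) + 1)
```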

[0042] According to an embodiment of the present invention, skin-tone detection can be based on a normalized color model using a learned mixture of Gaussian distributions. The use of normalized colors $\left( \frac{r}{r+g+b}, \frac{g}{r+g+b} \right)$

[0043] can reduce the variance of skin color in an image. Also, it has been shown that skin color can be modeled by a multivariate Gaussian in HS (hue and saturation) space under certain lighting conditions. In general, for a Gaussian mixture model with n components, the conditional probability density for an observation χ of dimensionality d is: $$p(\chi \mid \theta) = \sum_{i=1}^{n} \pi_i \frac{e^{-\frac{1}{2}(\chi - \mu_i)^T \Sigma_i^{-1} (\chi - \mu_i)}}{(2\pi)^{d/2} \left| \Sigma_i \right|^{1/2}} \qquad (1)$$

[0044] where mixing parameter π_(i) corresponds to the prior probability of mixture component i and each component is a Gaussian with mean vector μ_(i) and covariance matrix Σ_(i). According to an embodiment of the present invention, skin colors can be modeled in the normalized RG (red and green) space. With learned mean vectors μ, covariance matrix Σ, and known prior π, a likelihood is determined for each pixel of the image according to Equation (1) above. According to one embodiment of the present invention, the likelihood of a pixel I(x, y) can be defined as: $$L(x, y) = \begin{cases} p(\chi \mid \theta) & \text{if } I(x, y) \in \text{foreground}; \\ 0 & \text{otherwise}. \end{cases} \qquad (2)$$

[0046] For a foreground pixel with its normalized color observation χ, the likelihood of the pixel is defined as its estimated density. For background pixels, the likelihood values are set to 0. A possible method to select skin pixels is to apply a simple threshold to Equation (2). If the likelihood of a pixel is larger than the threshold, the pixel is then classified as a skin pixel, and the largest skin area of the image is often viewed as the detected skin object.
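
A minimal sketch of Equations (1) and (2) in Python follows; it assumes the mixture parameters (weights, means, covariances) have already been learned offline from training images, and the array layout is an illustrative choice:

```python
import numpy as np

def gmm_density(chi, weights, means, covs):
    """Equation (1): Gaussian mixture density at the normalized-color
    observations chi (an N x d array, here d = 2 for rg space)."""
    d = chi.shape[1]
    total = np.zeros(len(chi))
    for pi_i, mu_i, sigma_i in zip(weights, means, covs):
        diff = chi - mu_i
        inv = np.linalg.inv(sigma_i)
        maha = np.einsum('ij,jk,ik->i', diff, inv, diff)  # Mahalanobis term
        norm = (2.0 * np.pi) ** (d / 2.0) * np.sqrt(np.linalg.det(sigma_i))
        total += pi_i * np.exp(-0.5 * maha) / norm
    return total

def skin_likelihood(rgb, foreground, weights, means, covs):
    """Equation (2): mixture density on foreground pixels, 0 elsewhere."""
    rgbf = rgb.astype(np.float64)
    s = rgbf.sum(axis=2) + 1e-8                  # guard against r+g+b = 0
    chi = np.stack([rgbf[..., 0] / s, rgbf[..., 1] / s], axis=-1)
    L = gmm_density(chi.reshape(-1, 2), weights, means, covs)
    L = L.reshape(rgb.shape[:2])
    L[~foreground] = 0.0                         # background pixels set to 0
    return L
```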

[0047] The trajectory of the centroid of the detected skin object is often used as the motion trajectory of the object. However, it has been determined that there are many objects having skin-like color in an office environment. For example, a wooden bookshelf or a poster on a wall may be misclassified as a skin-like object. Therefore, the system and method attempts to eliminate background pixels as discussed above. In addition, the skin objects (the user's hand and possibly the arm) are sometimes split up into two or more blobs. Other skin regions, such as the face, may also appear in the view of the camera. These problems, together with non-uniform illumination, make the centroid vary dramatically and lead to false detections. For these reasons, a stable motion trajectory is hard to obtain by just finding the largest skin area. To handle these problems, a temporal likelihood L^(t)(x, y, t) can be defined for each pixel I(x, y) as:

$$L^{t}(x, y, t) = \lambda L(x, y) + (1 - \lambda)\, L^{t}(x, y, t-1) \qquad (3)$$

[0048] where λ is a decay factor. Experiments show that a value of λ equal to 0.5 can be used.
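
Equation (3) reduces to a one-line recursive update per frame; a sketch, using the value λ = 0.5 reported above:

```python
def update_temporal_likelihood(L, L_prev, lam=0.5):
    """Equation (3): blend the current likelihood with the previous
    temporal likelihood using the decay factor lambda."""
    return lam * L + (1.0 - lam) * L_prev
```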

[0049] To select skin pixels, a threshold δ is applied to the temporal likelihood L^(t)(x, y, t) instead of the likelihood L(x, y) of each pixel. Thus, the thresholded temporal likelihood of a pixel can be defined as: $$L_{\delta}^{t}(x, y, t) = \begin{cases} L^{t}(x, y, t) & \text{if } L^{t}(x, y, t) > \delta; \\ 0 & \text{otherwise}. \end{cases} \qquad (4)$$

[0050] The moments of the image can be determined as follows: $$M_{00}^{t} = \int\!\!\int L_{\delta}^{t}(x, y, t)\, dx\, dy \qquad (5)$$ $$M_{10}^{t} = \frac{\int\!\!\int x\, L_{\delta}^{t}(x, y, t)\, dx\, dy}{M_{00}^{t}} \qquad (6)$$ $$M_{01}^{t} = \frac{\int\!\!\int y\, L_{\delta}^{t}(x, y, t)\, dx\, dy}{M_{00}^{t}} \qquad (7)$$

[0051] According to an embodiment of the present invention, M₀₀^(t) is viewed as the size of the skin region, and (M₁₀^(t), M₀₁^(t)) is taken as the centroid that forms the motion trajectory. In this way, the system and method provide a reasonable solution to the extraction of hand motion trajectories, allowing the user's gesture to be classified precisely.
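
A discrete sketch of Equations (4) through (7) follows, with the integrals replaced by sums over the pixel grid; the coordinate convention (x for columns, y for rows) is an assumption:

```python
import numpy as np

def size_and_centroid(L_t, delta):
    """Threshold the temporal likelihood (Equation (4)), then compute the
    skin size M00 (Equation (5)) and centroid (M10, M01) (Equations (6)-(7))."""
    L_d = np.where(L_t > delta, L_t, 0.0)       # Equation (4)
    m00 = L_d.sum()                             # Equation (5)
    if m00 == 0.0:
        return 0.0, None                        # no skin pixels detected
    ys, xs = np.indices(L_d.shape)
    m10 = (xs * L_d).sum() / m00                # Equation (6)
    m01 = (ys * L_d).sum() / m00                # Equation (7)
    return m00, (m10, m01)
```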

[0052] Recognition of a user's hand motion patterns can be accomplished using a TDNN according to an embodiment of the present invention. Experiments show that TDNN has good performance on motion pattern classification. As shown by experiments, TDNN has better performance if the number of output labels is kept small. Another advantage is that a small number of output labels makes the networks simple and saves time at the network training stage. For these reasons, the user's gestures are tested hierarchically. Further, TDNN, applied hierarchically, has been determined to be suitable for the classification of the eight motion patterns described above. For instance, left movement and right movement have the common motion pattern of horizontal hand movement. Thus, once horizontal movement is detected, the range of the motion is compared with the reference point to differentiate these two gestures.

[0053] Without introducing the reference point, the neural network has difficulty in discriminating the gestures. The input patterns of the TDNNs have a fixed input length. Since classification is to be performed in real-time as the user moves his hand, the motion patterns are classified along windows in time. At time t, the centroid c_(t) is obtained as described with respect to motion trajectory extraction.

[0054] Suppose the length of an input pattern is w; the feature vectors {v_(t−w+1), v_(t−w+2), . . . , v_(t)} from {c_(t−w), c_(t−w+1), . . . , c_(t)} are extracted to form a TDNN input pattern. When the maximum response from the network is relatively small, as compared with other label responses, the input pattern is classified as an unknown. Some false detections or unknowns are inevitable. False detection can occur when the trajectory of a translation is similar to an arc of a circle. To minimize false detection and obtain stable performance, a fixed number of past results are checked. When more than half of these past results indicate the same output pattern, this output pattern is determined to be a final result. This method has been used to successfully obtain a reliable recognition rate.
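
The voting step described above might be sketched as follows; the window size of 10 past results is an assumed value, not one given in the disclosure:

```python
from collections import Counter, deque

class GestureVoter:
    """Check a fixed number of past classification results and report a
    gesture only when more than half of them agree."""

    def __init__(self, window=10):
        self.history = deque(maxlen=window)

    def vote(self, label):
        self.history.append(label)
        if len(self.history) < self.history.maxlen:
            return "unknown"                    # not yet enough data to decide
        winner, count = Counter(self.history).most_common(1)[0]
        return winner if count > self.history.maxlen / 2 else "unknown"
```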

[0055] FIG. 4 shows a hierarchy of the motion pattern classifier according to an embodiment of the present invention. For the detection of a reference point, when a user keeps his/her hand stationary 401 for a period of time, that is, both size and centroid are almost the same along some time interval, the method detects and updates a reference point 402. The reference point will not be interpreted as a gesture command by the system and method.

[0056] The method detects forward/backward translations 403. The skin size information obtained from Equation (5) can be used to determine a translation. Since the movement forward or backward is roughly along the Z-axis of the camera, these two translations are characterized by a dramatic change in skin size and a subtle change in the centroid of the detected skin region. The estimated size of the hand is compared to the size of the hand when the reference point was initialized to differentiate between a forward and a backward movement.
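
A sketch of the forward/backward test follows; the ratio threshold of 1.3 is an illustrative assumption for what counts as a "dramatic" change in skin size:

```python
def classify_depth_move(size_now, size_ref, ratio=1.3):
    """Compare the current skin size (M00) against the size recorded when
    the reference point was initialized."""
    if size_now > size_ref * ratio:
        return "forward"        # hand closer to the camera, larger region
    if size_now < size_ref / ratio:
        return "backward"       # hand farther from the camera, smaller region
    return None                 # no depth translation detected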

[0057] Further, the method can detect left/right/up/down translations 405. The normalized vector between centroids c_(t) and c_(t−1) is computed as the feature vector. There are three output patterns: vertical movement, horizontal movement, and unknown. To determine whether a movement is vertical or horizontal, the centroid of the reference point is compared to the centroid currently estimated in the frame. If the result is unknown, e.g., it can be a circular movement, the input pattern is tested at the next stage.
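
The feature vector for this stage is the normalized displacement between successive centroids; a sketch follows, where the image coordinate convention (x increasing rightward, y increasing downward) is an assumption:

```python
import numpy as np

def displacement_feature(c_t, c_prev):
    """Normalized vector between centroids c_t and c_{t-1}."""
    v = np.asarray(c_t, dtype=float) - np.asarray(c_prev, dtype=float)
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def side_of_reference(c_t, c_ref, horizontal):
    """Disambiguate left/right or up/down by comparing the current
    centroid with the reference-point centroid."""
    if horizontal:
        return "left" if c_t[0] < c_ref[0] else "right"
    return "up" if c_t[1] < c_ref[1] else "down"
```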

[0058] For the detection of circular movements, the angle between vector c_(t)c_(t−1) and vector c_(t−1)c_(t−2) is computed as the feature vector 406. This feature can distinguish between clockwise and counterclockwise circular movements. As expected, users can draw circles from any position. In particular, a spiral would be classified as one of the circular movements instead of a translation. Referring to FIG. 4, the method can use a voting method 407 that checks past results to form meaningful output; by doing so, the system decreases the possibility of false classification. The method determines whether a given gesture is a valid gesture command 408. A valid gesture needs to be performed continually over some time interval to initialize the command.
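
The angle feature for the circular-movement stage can be sketched as a signed turn angle between successive displacement vectors; the sign convention chosen here (positive for counterclockwise in standard axes) is an assumption:

```python
import numpy as np

def turn_angle(c_t, c_t1, c_t2):
    """Angle between the displacement c_{t-1}->c_t and the displacement
    c_{t-2}->c_{t-1}; a consistently signed angle over the window suggests
    a clockwise or counterclockwise circular movement."""
    v_prev = np.asarray(c_t1, float) - np.asarray(c_t2, float)
    v_curr = np.asarray(c_t, float) - np.asarray(c_t1, float)
    cross = v_prev[0] * v_curr[1] - v_prev[1] * v_curr[0]
    dot = float(np.dot(v_prev, v_curr))
    return np.arctan2(cross, dot)   # signed angle in radians, (-pi, pi]
```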

[0059] FIGS. 5 and 6 show some examples of experimental results. In each image, the black region, e.g., 501, is viewed as background. The bounding box, e.g., 502 (highlighted in white in FIG. 5b for clarity), of each image indicates the largest skin area as determined by the thresholded likelihood, Equation (2). Note that bounding boxes are only used for display. The arrow(s), e.g., 503, on each bounding box show the classification result. A bounding box with no arrow on it, for example, as in FIGS. 5a-c, means that the gesture is an unknown pattern, that no movement has occurred, or that insufficient data has been collected. Because motion patterns are classified along windows in time, there may be some delay after a gesture is initialized (data is not sufficient for the system to make a global decision).

[0060] According to an embodiment of the present invention, unintentional movements can be checked using a voting method 407 that checks past results to form meaningful outputs, thus decreasing the possibility of false classification. Further, a user can change gestures without holding his/her hand stationary. For any two gestures that can be distinguished without a new reference point, for example, turning left and then up, or a translation followed by a circular movement, the user does not need to hold the hand stationary in between. In tests, the system demonstrated reliable and accurate performance.

[0061] A need exists for an intuitive gesture interface for medical imaging workstations. The present invention proposes a real-time system and method that recognizes gestures to drive a virtual endoscopy system. The system and method can classify a user's gesture as one of eight defined motion patterns: turn left/right, rotate clockwise/counterclockwise, move up/down, and move in depth in/out. Detecting composite gesture commands on a two-dimensional plane would need more modification. In addition, the current work takes advantage of the fact that some translation patterns are performed along the Z-axis of the camera. With only one camera, designing a six degree-of-freedom gesture interface with a more flexible camera position needs more research. The system and method have been tested in a laboratory setting, and further work is needed to improve the system and to evaluate it in a clinical setting.

[0062] Having described embodiments for a system and method for a real-time gesture interface for medical workstations, it is noted that modifications and variations can be made by persons skilled in the art in light of the above teachings. It is therefore to be understood that changes may be made in the particular embodiments of the invention disclosed which are within the scope and spirit of the invention as defined by the appended claims. Having thus described the invention with the details and particularity required by the patent laws, what is claimed and desired to be protected by Letters Patent is set forth in the appended claims.

What is claimed is:
 1. A method for determining a gesture comprising the steps of: determining a change in a background of an image from a plurality of images; determining an object in the image; determining a trajectory of the object through the plurality of images; and classifying a gesture according to the trajectory of the object.
 2. The method of claim 1, wherein the step of determining the change in the background further comprises the steps of: determining a gradient intensity map for the background from a plurality of images; determining a gradient intensity map for the current image; determining, for a plurality of pixels, a difference between the gradient intensity map and the gradient intensity map for the background; determining a comparison between the difference and a threshold; and determining a pixel to be a background pixel according to the comparison.
 3. The method of claim 1, wherein the object includes a user's hand.
 4. The method of claim 1, wherein the step of determining the object in the image further comprises the steps of: obtaining a normalized color representation for a plurality of colors in each image; determining from training images an estimate of a probability distribution of normalized color values for an object class; and determining, for each pixel, a likelihood according to an estimated probability density of normalized color values for the object class.
 5. The method of claim 1, wherein the step of determining the trajectory of the object through the plurality of images further comprises the steps of: determining, for each pixel, a temporal likelihood across a plurality of images; and determining a plurality of moments according to the temporal likelihoods.
 6. The method of claim 1, wherein the step of determining the trajectory further comprises the steps of: determining a difference in a size of the object over a predetermined time period; determining a plurality of angles between a plurality of lines connecting successive centroids over the time period; and determining a feature vector according to the angles and lines.
 7. The method of claim 6, further comprising the step of classifying the feature vector according to a time-delay neural network, wherein a feature is of a fixed length.
 8. The method of claim 1, wherein the step of classifying the gesture further comprises the steps of: determining a reference point; determining a correspondence between the trajectory and the reference point; and classifying the trajectory according to one of a plurality of commands.
 9. A method for determining a trajectory of a hand through a plurality of images comprising the steps of: detecting a reference point; updating the reference point as the reference point is varied; detecting a first translation of the hand through the plurality of images; detecting a second translation through the plurality of images; determining a gesture according to a vote; and determining whether the gesture is a valid gesture command.
 10. The method of claim 9, wherein the reference point is not interpreted as a gesture command.
 11. The method of claim 9, wherein the reference point is characterized by hand size and a location of a centroid of the hand in each image.
 12. The method of claim 9, wherein the first translation is one of a forward and a backward translation, wherein the first translation is characterized by a large change in hand size and a relatively small change in a centroid of the hand.
 13. The method of claim 9, wherein the second translation is one of a left, a right, an up and a down translation.
 14. The method of claim 9, wherein the step of detecting the second translation further comprises the step of determining a normalized vector between two centroids c_(t) and c_(t−1) as a feature vector, wherein there are three output patterns.
 15. The method of claim 14, wherein the three output patterns are a vertical movement, a horizontal movement, and an unknown, the method further comprising the steps of: comparing the reference point to a centroid upon determining the translation to be a vertical or a horizontal translation; and testing an input pattern upon determining the translation to be an unknown translation.
 16. The method of claim 15, wherein the step of testing an input pattern further comprises the step of detecting a circular movement, wherein an angle between vector c_(t)c_(t−1) and vector c_(t−1)c_(t−2) is determined as the feature vector.
 17. The method of claim 9, wherein the valid gesture is performed continually for a predetermined time.
 18. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for determining a gesture, the method steps comprising: determining a change in a background of an image from a plurality of images; determining an object in the image; determining a trajectory of the object through the plurality of images; and classifying a gesture according to the trajectory of the object.
 19. The method of claim 18, wherein the step of determining the change in the background further comprises the steps of: determining a gradient intensity map for the background from a plurality of images; determining a gradient intensity map for the current image; determining, for a plurality of pixels, a difference between the gradient intensity map and the gradient intensity map for the background; determining a comparison between the difference and a threshold; and determining a pixel to be a background pixel according to the comparison.
 20. The method of claim 18, wherein the object includes a user's hand.
 21. The method of claim 18, wherein the step of determining the object in the image further comprises the steps of: obtaining a normalized color representation for a plurality of colors in each image; determining from training images an estimate of a probability distribution of normalized color values for an object class; and determining, for each pixel, a likelihood according to an estimated probability density of normalized color values for the object class.
 22. The method of claim 18, wherein the step of determining the trajectory of the object through the plurality of images further comprises the steps of: determining, for each pixel, a temporal likelihood across a plurality of images; and determining a plurality of moments according to the temporal likelihoods.
 23. The method of claim 18, wherein the step of determining the trajectory further comprises the steps of: determining a difference in a size of the object over a pre-determined time period; determining a plurality of angles between a plurality of lines connecting successive centroids over the time period; and determining a feature vector according to the angles and lines.
 24. The method of claim 23, further comprising the step of classifying the feature vector according to a time-delay neural network, wherein a feature is of a fixed length.
 25. The method of claim 18, wherein the step of classifying the gesture further comprises the steps of: determining a reference point; determining a correspondence between the trajectory and the reference point; and classifying the trajectory according to one of a plurality of commands.